Add vLLM multi-instance CPU benchmark skill by rjauhari2 · Pull Request #81 · amd/skills

rjauhari2 · 2026-06-30T08:48:49Z

What

Adds vllm-multiinstance: benchmarks a vLLM CPU image on AMD EPYC by running N vLLM instances (each pinned to 32 physical cores) behind an NGINX load balancer, driving load with guidellm via ansible, and reporting peak aggregate memory (podman stats) plus end-to-end throughput/latency across models, concurrency rates, and instance counts. The benchmark harness is vendored with the skill; only podman + ansible are required.
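As an illustration of the "each pinned to 32 physical cores" layout, here is a minimal sketch that computes one cpuset range per instance. It is not taken from the vendored harness: the function name `cpuset_ranges` is hypothetical, and it assumes physical cores are numbered contiguously from 0, which may not hold on hosts where SMT siblings are interleaved. `--cpuset-cpus` is the standard podman flag for CPU pinning.

```python
from typing import List, Optional


def cpuset_ranges(
    num_instances: int,
    cores_per_instance: int = 32,
    total_cores: Optional[int] = None,
) -> List[str]:
    """Return one contiguous cpuset string per instance, e.g. '0-31'.

    Assumption: physical cores are numbered 0..total_cores-1 with no gaps.
    """
    if num_instances < 1 or cores_per_instance < 1:
        raise ValueError("num_instances and cores_per_instance must be >= 1")
    needed = num_instances * cores_per_instance
    if total_cores is not None and needed > total_cores:
        raise ValueError(f"need {needed} cores but host has {total_cores}")

    ranges = []
    for i in range(num_instances):
        first = i * cores_per_instance
        last = first + cores_per_instance - 1
        ranges.append(f"{first}-{last}")
    return ranges


if __name__ == "__main__":
    # Example: 4 instances on a hypothetical 128-core host.
    for idx, cpus in enumerate(cpuset_ranges(4, total_cores=128)):
        print(f"instance {idx}: --cpuset-cpus={cpus}")
```

Passing `total_cores` makes an oversubscribed layout fail early rather than silently overlapping instances.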

The skill is robust on leaner hosts:

- A host preflight (check-host.sh) fails fast with actionable guidance on unresolvable image short-names, missing rootless cgroup cpuset delegation, and CNI cniVersion mismatch.
- start.sh auto-downgrades the CNI conflist, guards static-IP fallback, and fast-fails dead containers instead of hanging the health wait.
- run_sweep.sh auto-detects missing passwordless sudo and runs ansible/guidellm rootless (ansible_become=false).
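The rootless fallback in the last item can be sketched as follows. The PR does not show how run_sweep.sh performs the check, so this assumes the common `sudo -n true` idiom (non-interactive sudo exits non-zero when a password would be required). The helper names `has_passwordless_sudo` and `become_args` are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence


def _run_quiet(cmd: Sequence[str]) -> int:
    """Run a command, discard its output, and return its exit code."""
    try:
        return subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        ).returncode
    except FileNotFoundError:
        # sudo is not installed at all: treat as "no passwordless sudo".
        return 127


def has_passwordless_sudo(run: Callable[[Sequence[str]], int] = _run_quiet) -> bool:
    """True if `sudo -n true` succeeds without prompting for a password."""
    return run(["sudo", "-n", "true"]) == 0


def become_args(passwordless: bool) -> List[str]:
    """Extra ansible-playbook arguments: disable become when sudo would prompt."""
    return [] if passwordless else ["-e", "ansible_become=false"]


# Usage (assumed wiring, not the harness's actual command line):
#   args = ["ansible-playbook", "playbook.yml", *become_args(has_passwordless_sudo())]
```

Injecting `run` keeps the detection testable without touching the host's sudo configuration.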

Testing

- Structural gate (.github/scripts/check.sh) passes with 0 errors and reports the Cursor marketplace manifest is up to date.
- Behavioral eval (LLM-judged, sonnet) 5/5 passed -- covers sweep sizing, the guidellm.log-vs-benchmarks.json score rule, host-preflight fail-fast, image short-name remediation, and rootless/become auto-detection.
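To make "sweep sizing" concrete, the sketch below enumerates a sweep over models, instance counts, and concurrency rates. It assumes the sweep is the full Cartesian product of the three axes, which the PR text implies but does not state. The model names and the function `sweep_points` are placeholders, not values from the harness.

```python
from itertools import product
from typing import List, Sequence, Tuple


def sweep_points(
    models: Sequence[str],
    instance_counts: Sequence[int],
    rates: Sequence[int],
) -> List[Tuple[str, int, int]]:
    """Enumerate (model, instance_count, rate) for every combination."""
    return list(product(models, instance_counts, rates))


if __name__ == "__main__":
    # Placeholder axes: 2 models x 3 instance counts x 2 rates = 12 runs.
    points = sweep_points(["model-a", "model-b"], [1, 2, 4], [8, 16])
    print(f"{len(points)} benchmark runs")
```

The run count is the product of the axis lengths, so adding one more model or rate multiplies the total wall-clock time accordingly.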

Made with Cursor

Adds `vllm-multiinstance`: benchmarks a vLLM CPU image on AMD EPYC by running N vLLM instances (each pinned to 32 physical cores) behind an NGINX load balancer, driving load with guidellm via ansible, and reporting peak aggregate memory (podman stats) + end-to-end throughput/latency across models, concurrency rates, and instance counts. The benchmark harness is vendored with the skill; only podman + ansible are required.

Robust on leaner hosts: a host preflight (check-host.sh) fails fast with actionable guidance on unresolvable image short-names, missing rootless cgroup cpuset delegation, and CNI cniVersion mismatch; start.sh auto-downgrades the CNI conflist, guards static-IP fallback, and fast-fails dead containers instead of hanging the health wait; run_sweep.sh auto-detects missing passwordless sudo and runs ansible/guidellm rootless (ansible_become=false).

Contents: SKILL.md, skill-card.md, README.md, reference.md, vendored harness/, scripts/; registered in .claude-plugin/marketplace.json (+ regenerated .cursor-plugin manifest); behavioral eval at eval/behavioral/tests/test_vllm_multiinstance.py.

Testing: structural gate (check.sh) passes with 0 errors; behavioral eval (LLM-judged, sonnet) 5/5 passed -- covers sweep sizing, the guidellm.log-vs-benchmarks.json score rule, host-preflight fail-fast, image short-name remediation, and rootless/become auto-detection.

Signed-off-by: Rahul Jauhari <rahul.jauhari@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Change-Id: Ifa419ea79793fcfb303b6f1cc657539b22622f8b

rjauhari2 wants to merge 1 commit into amd:main from rjauhari2:add-vllm-multiinstance-skill
